A new estimator of the discovery probability.

Authors

  • Stefano Favaro
  • Antonio Lijoi
  • Igor Prünster
Abstract

Species sampling problems have a long history in ecological and biological studies, and a number of issues remain to be addressed, including the evaluation of species richness, the design of sampling experiments, and the estimation of rare species variety. Such inferential problems have recently emerged in genomic applications as well, where they exhibit some peculiar features that make them more challenging: specifically, one has to deal with very large populations (genomic libraries) containing a huge number of distinct species (genes), of which only a small portion has been sampled (sequenced). These aspects motivate the Bayesian nonparametric approach we undertake, since it provides the degree of flexibility typically needed in this framework. Based on an observed sample of size n, the focus is on predicting a key aspect of the outcome of an additional sample of size m, namely, the so-called discovery probability. In particular, conditionally on an observed basic sample of size n, we derive a novel estimator of the probability of detecting, at the (n+m+1)th observation, species that have been observed with any given frequency in the enlarged sample of size n+m. This estimator admits a closed-form expression that can be evaluated exactly. The result allows us to quantify both the rate at which rare species are detected and the achieved sample coverage of abundant species, as m increases. Natural applications include estimating the probability of discovering rare genes within genomic libraries, and the results are illustrated by means of two expressed sequence tags datasets.
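The abstract does not state the estimator's formula, so the sketch below is not the authors' Bayesian nonparametric estimator. As a point of reference only, it shows the classical Good-Turing estimate of the same kind of quantity in the simpler case m = 0: the probability that the next observation belongs to a species seen exactly k times in the observed sample of size n, estimated as (k+1)·m_{k+1}/n, where m_j is the number of species observed j times. The function name and the toy sample are assumptions made for illustration.

```python
from collections import Counter


def good_turing_discovery(sample, k):
    """Good-Turing estimate of the probability that the next observation
    is a species seen exactly k times in `sample` (k = 0 means a new species).

    Illustrative reference point only; NOT the estimator from the paper.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    if k < 0:
        raise ValueError("k must be non-negative")
    species_counts = Counter(sample)                   # species -> frequency
    freq_of_freqs = Counter(species_counts.values())   # frequency j -> m_j
    return (k + 1) * freq_of_freqs.get(k + 1, 0) / n


# Hypothetical toy sample: letters stand in for species (genes).
sample = ["a", "a", "a", "b", "b", "c", "d", "e", "e", "f"]
for k in range(4):
    print(k, good_turing_discovery(sample, k))
```

With k = 0 this reduces to the proportion of singletons, m_1/n, the usual estimate of the probability of discovering a new species.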


Similar resources

On Presentation a new Estimator for Estimating of Population Mean in the Presence of Measurement error and non-Response

Introduction: According to classical sampling theory, the errors mainly considered in estimation are sampling errors. However, most non-sampling errors affect the properties of estimators more than sampling errors do. This has been confirmed by researchers over the past two decades, especially in relation to non-response errors that are one of the most fundamental non-immolation...

A New Ridge Estimator in Linear Measurement Error Model with Stochastic Linear Restrictions

In this paper, we propose a new ridge-type estimator called the new mixed ridge estimator (NMRE) by unifying the sample and prior information in linear measurement error model with additional stochastic linear restrictions. The new estimator is a generalization of the mixed estimator (ME) and ridge estimator (RE). The performances of this new estimator and mixed ridge estimator (MRE) against th...

A New Estimator of Entropy

In this paper we propose an estimator of the entropy of a continuous random variable. The estimator is obtained by modifying the estimator proposed by Vasicek (1976). Consistency of estimator is proved, and comparisons are made with Vasicek’s estimator (1976), van Es’s estimator (1992), Ebrahimi et al.’s estimator (1994) and Correa’s estimator (1995). The results indicate that the proposed esti...

The Zografos–Balakrishnan-log-logistic Distribution

The Zografos–Balakrishnan-log-logistic (ZBLL) distribution is a new three-parameter distribution introduced by Ramos et al. [1], who presented some of its properties, such as its probability density function, cumulative distribution function, moment generating function, hazard (failure) rate function, quantiles and moments, Rényi and Shannon ...

A New Exponential Type Estimator for the Population Mean in Simple Random Sampling

In this paper, a new exponential-type estimator of the finite population mean is introduced that uses auxiliary information under simple random sampling without replacement. The new estimator is compared with several other estimators in terms of mean square error, using two real data sets.

An Empirical Comparison of Performance of the Unified Approach to Linearization of Variance Estimation after Imputation with Some Other Methods

Imputation is one of the most common methods to reduce the effects of item non-response. Imputation results in a complete data set, which makes it possible to use naïve estimators. After most common imputation methods are applied, the mean and total (imputation estimators) are still unbiased. However, their variances (imputation variances) are underestimated by naïve variance estimators. Sampling mechanism an...

Journal:
  • Biometrics

Volume 68, Issue 4

Pages: -

Publication date: 2012